Association between early preterm birth and maternal exposure to fine particulate matter (PM10): A nation-wide population-based cohort study using machine learning

Although early preterm birth (PTB), a birth before 34 weeks of gestation, accounts for less than 3% of total births, it is a critical cause of various perinatal morbidities and mortality. Several studies have been conducted on the association between maternal exposure to particulate matter (PM) and PTB, but the results were inconsistent. Moreover, no study has analyzed the risk of PM on PTB among women with cardiovascular diseases, even though they are thought to be highly susceptible to PM considering its cardiovascular effects. Therefore, we aimed to evaluate the effect of PM10 on early PTB according to the period of exposure, using machine learning with data from Korea National Health Insurance Service (KNHI) claims. Furthermore, we conducted a subgroup analysis to compare the risk of PM on early PTB between pregnant women with and without cardiovascular diseases. A total of 149,643 primiparous women with singleton pregnancies, aged 25 to 40 years, who delivered in 2017 were included. Random forest feature importance and SHAP (Shapley additive explanations) values were used to identify the effect of PM10 on early PTB in comparison with other well-known contributing factors of PTB. The AUC and accuracy of the early PTB prediction model using random forest were 0.9988 and 0.9984, respectively. Maternal exposure to PM10 was one of the major predictors of early PTB. PM10 concentration at 5 to 7 months before delivery, corresponding to the first and early second trimester of pregnancy, ranked high in feature importance. SHAP values showed that higher PM10 concentrations at 5 to 7 months before delivery were associated with an increased risk of early PTB. The probability of early PTB increased by 7.73%, 10.58%, or 11.11% when the PM10 concentration at 5, 6, or 7 months before delivery, respectively, was included in the prediction model. Furthermore, women with cardiovascular diseases were more susceptible to PM10 in terms of risk for early PTB than those without cardiovascular diseases.
Maternal exposure to PM10 has a strong association with early PTB. In addition, in the context of PTB, pregnant women with cardiovascular diseases are a group at high risk from PM10, and the first and early second trimester is a high-risk period for PM10 exposure.


Introduction
Preterm birth (PTB), a delivery before 37 0/7 weeks of gestation, has long been a major unsolved problem in obstetrics. PTB is divided into early and late PTB according to gestational age (GA). Early PTB is defined as a delivery occurring before 34 0/7 weeks of gestation and late PTB is defined as a delivery occurring between 34 0/7 and 36 6/7 weeks of gestation [1]. PTB accounts for up to 10% of total global births. The early PTB rate was about 2.8% in 2019 in the United States and has not decreased over the past decades [2][3][4][5]. Although the early PTB rate is lower than the late PTB rate, early PTB has a more significant clinical impact. The mortality of infants born in the early PTB period was more than 5 times higher than that of infants born in the late PTB period in the United States in 2018 [6]. Moreover, early PTB neonates are also at greater risk of various morbidities than late PTB neonates [7]. Major neonatal complications, including respiratory distress syndrome (RDS), intraventricular hemorrhage (IVH), and even long-term neurodevelopmental morbidities, increase with decreasing GA [8][9][10]. For these reasons, prediction and management of early PTB have always been important issues.
Various factors associated with PTB, ranging from genetic features to environmental factors, have been reported [11][12][13][14]. Among the environmental factors affecting PTB, air pollution, especially exposure to fine particulate matter (PM), has drawn increasing attention in recent decades. Many studies on the association between PM and PTB have been conducted, but the results were conflicting [12][13][14][15][16][17][18][19][20][21][22]. Huynh et al. reported that maternal exposure to PM can increase the risk of PTB, whereas Pereira et al. could not find a significant association between the two [15,16]. Several meta-analyses have been conducted to examine the association between PTB and PM, but their results were also inconsistent [23][24][25][26]. Ju L et al. demonstrated that exposure to PM10 throughout pregnancy was associated with an increased risk of moderate PTB (delivery at 32-36 weeks of gestation) with a relative risk (RR) of 1.80 (95% confidence interval [CI]: 1.05-1.11) and very PTB (28-31 weeks of gestation) with an RR of 1.13 (95% CI: 1.06-1.21) [23]. However, Yu Z et al. reported no significant association between PM10 and moderate or very PTB [24]. Therefore, the association between PM10 and PTB is not yet definitive. PM10, which consists of particles with an aerodynamic diameter of 10 μm or less, is a well-known risk factor for cardiovascular diseases [27][28][29][30]. Several previous studies demonstrated that cardiovascular diseases in pregnant women are associated with an increased risk of PTB [31][32][33][34]. Furthermore, the more severe the cardiovascular disease, the greater the risk of PTB. Based on the association between PM10 and cardiovascular diseases, it is postulated that pregnant women with cardiovascular diseases may be more susceptible to the effect of PM10 on PTB. However, evidence about the effect of PM10 on pregnant women with cardiovascular diseases is lacking.
Therefore, this study aimed to evaluate the effect of PM10 on early PTB compared with the effects of known PTB-contributing factors by establishing a prediction model of early PTB using machine learning. In this study, we used data extracted from Korea National Health Insurance (KNHI) claims and PM10 concentrations estimated by the national system. In addition, we compared the effects of PM10 on PTB in pregnant women with cardiovascular diseases and those without cardiovascular diseases.

Study population
This nation-wide population-based cohort study included singleton primiparous women aged 25 to 40 years who delivered babies in 2017. Those who had late PTB were excluded. Data were extracted from KNHI claims. In South Korea, more than 97% of the total population is enrolled in KNHI. The KNHI database contains almost all data covered by the insurance under the National Health Insurance System. KNHI claims data were provided after de-identification according to the Act on the Protection of Personal Information [35]. This retrospective cohort study was approved by the Institutional Review Board (IRB) of Korea University Anam Hospital on November 5, 2018 (2018AN0365). Informed consent was waived by the IRB.

Variables
The dependent variable was early PTB in 2017. All variables except for PM10 were defined according to ICD-10 codes and procedure codes (S1 Table). PM10 concentration by region was provided by the National Ambient Air Monitoring System in South Korea, which consists of 505 stations covering all 162 cities, counties, and districts in the nation. Using the demographic information of the study population provided by the KNHIS database, we matched the monthly concentration of PM10 to each participant. Missing PM10 concentration data were imputed using median substitution of the PM10 concentration obtained from a nearby monitoring station. A total of 55 independent variables covered the following information: (1) PM10 data in 2016 using regional PM10 concentration matched with the residence address of the study population, including PM10 concentration of each specific month (from January 2016 to December 2016) and PM10 concentration of each month before delivery (1 to 10 months before delivery); (2) demographic/socioeconomic determinants in 2017, including age and socioeconomic status measured by an insurance fee with a range of 0 (the lowest group) to 20 (the highest group); (3) obstetric and gynecologic diseases (namely, placenta previa, threatened abortion, incompetent internal os of cervix, gestational diabetes, hypertensive disorders during pregnancy (HDP) including gestational hypertension, preeclampsia and eclampsia, congenital malformation of uterus, pelvic inflammatory disease, vaginitis, endometriosis, abnormal menstruation, recurrent miscarriage or infertility) for any year between 2002 and 2016; (4) cardiovascular diseases (i.e., acyanotic congenital heart diseases (CHD), cyanotic CHD, arrhythmia, cardiomyopathy, congestive heart failure (CHF), ischemic heart disease (IHD), and cardiac arrest) for any year between 2002 and 2016; (5) other medical diseases, including hypertension, diabetes, hyperlipidemia, anemia, pulmonary embolism, sepsis, and stroke; and (6) medication history (that is, benzodiazepine, calcium channel blocker (CCB), nitrate, progesterone, hypnotic/sedative drug (antihistamine, zolpidem, eszopiclone, pentobarbital sodium, and benzodiazepine derivatives), and tricyclic antidepressant (TCA)) in 2002-2016. Women with cardiovascular diseases were defined as women who had a history of at least one of the following cardiovascular diseases: acyanotic CHD, cyanotic CHD, arrhythmia, cardiomyopathy, CHF, IHD, and cardiac arrest. These disease data and medication history were screened using ICD-10 and ATC codes, respectively (S2 Table).
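As an illustration of the exposure assignment described above, the following minimal Python sketch derives the PM10 concentration for each of the 1 to 10 months before delivery from a regional monthly table and fills a missing value with the median of nearby regions. The data layout, the function names (`months_before`, `exposure_by_lag`), and the `neighbours` adjacency table are assumptions made for illustration only; the paper does not specify the matching and imputation procedure beyond the description above.

```python
from statistics import median

def months_before(year, month, k):
    """Return the (year, month) that lies k calendar months before (year, month)."""
    index = year * 12 + (month - 1) - k
    return index // 12, index % 12 + 1

def exposure_by_lag(monthly_pm10, region, delivery, neighbours, max_lag=10):
    """PM10 for each of the 1..max_lag months before delivery.

    monthly_pm10: {(region, year, month): concentration or None}
    delivery:     (year, month) of delivery
    neighbours:   {region: [nearby regions]}  (hypothetical adjacency table)
    A missing value is replaced by the median of the nearby regions' readings
    for the same month; None is kept if no nearby region has data.
    """
    year, month = delivery
    exposure = {}
    for lag in range(1, max_lag + 1):
        y, m = months_before(year, month, lag)
        value = monthly_pm10.get((region, y, m))
        if value is None:
            nearby = [monthly_pm10.get((n, y, m)) for n in neighbours.get(region, [])]
            nearby = [v for v in nearby if v is not None]
            value = median(nearby) if nearby else None
        exposure[lag] = value
    return exposure
```

Under these assumptions, a woman who delivered in March 2017 would have her "5 months before delivery" exposure taken from October 2016, which is how the lagged variables differ from the calendar-month variables.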
A random forest with 100 decision trees was employed in this study (100 training sets were sampled with replacement, 100 decision trees were trained on the 100 training sets, and the 100 decision trees made 100 predictions). The random forest took a majority vote on the dependent variable. Data of 149,643 cases with full information were split into training and validation sets at a ratio of 80:20. Random forest feature importance was used to identify major determinants of early PTB and to test its associations with PM10 concentration, socioeconomic status, cardiovascular diseases, and medication history (benzodiazepine, progesterone, and tricyclic antidepressants). Subgroup analysis of pregnant women with underlying cardiovascular diseases was performed. Major determinants were defined as variables ranked in the top 50% among all variables in the early PTB prediction model. An oversampling approach was applied so that training could be balanced between the early PTB and term birth groups. Furthermore, to determine how specific variables worked in the prediction model, SHAP (Shapley Additive Explanations) values were computed. Python (CreateSpace: Scotts Valley, 2009) was employed for the analysis between December 15, 2021 and April 15, 2022.
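The bootstrap-and-vote procedure and the oversampling step described above can be sketched in plain Python as follows. This is not the code used in the study: each "tree" is simplified to a single-split decision stump, the oversampling is simple random duplication of the minority class, and all function names are hypothetical. A real analysis would more likely rely on an established library implementation.

```python
import random
from collections import Counter

def oversample(rows, labels, rng):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def fit_stump(rows, labels):
    """Best single-feature threshold rule (a one-split stand-in for a full tree)."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            for hi in (0, 1):  # label assigned when the feature exceeds t
                pred = [hi if r[j] > t else 1 - hi for r in rows]
                err = sum(p != y for p, y in zip(pred, labels))
                if best is None or err < best[0]:
                    best = (err, j, t, hi)
    _, j, t, hi = best
    return lambda r: hi if r[j] > t else 1 - hi

def fit_forest(rows, labels, n_trees=100, seed=0):
    """Oversample, then train n_trees stumps on bootstrap samples."""
    rng = random.Random(seed)
    rows, labels = oversample(rows, labels, rng)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return trees

def predict(trees, row):
    """Majority vote over the individual tree predictions."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]
```

Calling `fit_forest(rows, labels)` on a list of feature tuples with 0/1 labels returns the 100 fitted stumps, and `predict(trees, row)` returns the class chosen by most of them.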

Characteristics of study population
A total of 149,643 primiparous women were included in the final analysis. Among the study population, 3,066 (2.05%) women had early PTB and 10,953 (7.32%) women had at least one underlying cardiovascular disease. Maternal age at delivery was higher in women with early PTB than in those with term birth (32.19 years vs. 31.84 years, p < 0.0001). Most cardiovascular diseases except CHD were more common in women who had early PTB than in those who had term birth. Baseline characteristics of the study population are described in Table 1. Table 2 shows monthly PM10 concentration data (from January 2016 to December 2016) and PM10 concentration of each month before delivery (from 1 to 10 months before delivery) in each group (term birth vs. early PTB). The concentration of PM10 was significantly different between the early PTB and term birth groups in summer and early fall (from June to September). During the period from 5 to 7 months before delivery, women who had early PTB were exposed to significantly higher concentrations of PM10 than those who had term birth. Table 3(a) presents the accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the early PTB prediction model. With the random forest model for oversampled data, the AUC was 0.9988 and the accuracy was 0.9984. With the logistic regression model, the AUC was 0.6787 and the accuracy was 0.5450. The performance of the random forest model was superior to that of the logistic regression model. The model with oversampled data showed a greater AUC than the model with the original data. Therefore, we considered the findings of logistic regression as supplementary findings.
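For readers less familiar with the four performance measures reported in Table 3, the following short Python sketch shows how they are commonly computed from true labels, predicted labels, and predicted scores. The function names are hypothetical and the code is illustrative only; it is not taken from the study's analysis scripts.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # share of early PTB cases detected
    specificity = tn / (tn + fp)   # share of term births correctly identified
    return accuracy, sensitivity, specificity

def auc(y_true, scores):
    """Probability that a random positive case scores above a random negative
    case, counting ties as one half (equivalent to the area under the ROC curve)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because early PTB is rare (2.05% of this cohort), accuracy alone can be dominated by the term birth group, which is one reason sensitivity, specificity, and AUC are reported alongside it.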

Prediction model for early PTB and effect of PM10 on PTB
Feature importance results for the major determinants of early PTB are presented in Table 4. It should be noted that most of the major determinants of early PTB for oversampled data were similar to those for original data. Socioeconomic status influenced PTB the most, followed by age at delivery. Among the 27 major determinants of early PTB, PM10 concentrations of specific months before delivery ranked within the top 10 in oversampled data. PM10 concentration of each period before delivery (e.g., PM10 concentration at five months before delivery) had more impact on early PTB than PM10 concentration of a specific calendar month (e.g., PM10 concentration in December). This trend was also shown in the original data. This finding implies that maternal exposure to PM10 is associated with early PTB and that the impact of PM10 is greater than that of well-known contributing factors of early PTB, such as infection (feature importance in oversampled data: PM10 concentration at six months before delivery (0.0320) vs. pelvic inflammatory disease (0.0198) vs. vaginitis (0.0197)) (Table 4(a)). Fig 1 presents the SHAP values of the prediction model, which show the sign and magnitude of the effect of each major determinant on early PTB. SHAP values of PM10 concentration at 5 to 7 months before delivery (first and early second trimester of pregnancy) ranked high. Higher PM10 concentration increased the risk of early PTB. The probability of early PTB increased by 7.73%, 10.58%, or 11.11% when the PM10 concentration at 5, 6, or 7 months before delivery, respectively, was included in the prediction model.
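To make the idea behind a SHAP value concrete, the sketch below computes exact Shapley values for a toy three-variable model: each variable's value is its average marginal change in predicted probability when it is added to every possible subset of the other variables. The variable names and all probabilities in `toy_probability` are invented for illustration and are not results of this study; SHAP software applied to a random forest uses more efficient algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, features):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_probability(known):
    """Hypothetical predicted probability given the set of variables included."""
    p = 0.02                                   # baseline (invented)
    if "pm10_lag6" in known:
        p += 0.10                              # invented main effect
    if "age" in known:
        p += 0.03                              # invented main effect
    if {"pm10_lag6", "cvd"} <= known:
        p += 0.04                              # invented interaction
    return p

phi = shapley_values(toy_probability, ["pm10_lag6", "age", "cvd"])
```

In this toy example the contributions sum to the difference between the full-model prediction and the baseline, and the interaction term is shared equally between the two interacting variables.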

Effect of PM10 on PTB in women with underlying cardiovascular diseases
Subgroup analysis of women with underlying cardiovascular diseases was conducted. Table 3(b) presents the accuracy, sensitivity, specificity, and AUC of the subgroup analysis. The random forest prediction model for early PTB with oversampled data also performed well in both women with and without cardiovascular diseases. Table 4(b) presents the feature importance of major determinants of early PTB in the subgroup analysis. A total of 22 PM10 concentration variables ranked 3rd to 24th in feature importance among women with cardiovascular diseases, whereas 17 PM10 concentration variables ranked as major determinants among women without cardiovascular diseases. The rank of PM10 concentration was relatively lower in women without cardiovascular diseases than in those with cardiovascular diseases. This implies that women with cardiovascular diseases might be more susceptible to PM10 in terms of risk for early PTB than those without cardiovascular diseases. This trend was even stronger in the original data.
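The subgroup comparison rests on counting how many PM10 variables fall among the major determinants, defined in this study as the top 50% of variables by feature importance. A minimal Python sketch of that selection is given below. The function names, the `pm10` naming prefix, and the example importances are assumptions for illustration, and rounding down when the number of variables is odd is inferred from the 27 major determinants reported among 55 variables.

```python
def major_determinants(importance, fraction=0.5):
    """Names of the variables ranked in the top `fraction` by feature importance."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    keep = int(len(ranked) * fraction)  # rounds down, e.g. 55 variables -> 27
    return ranked[:keep]

def count_pm10(importance, prefix="pm10"):
    """Number of PM10 variables among the major determinants."""
    return sum(name.startswith(prefix) for name in major_determinants(importance))

# Invented importances, used only to show the call pattern.
example = {
    "ses": 0.30, "age": 0.20, "pm10_lag6": 0.15,
    "vaginitis": 0.10, "pm10_dec": 0.05, "anemia": 0.02,
}
top = major_determinants(example)
```

Applying `count_pm10` separately to the importances from each subgroup model would give the two counts being compared.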

Main finding
This large population-based cohort study established a prediction model for early PTB using random forest. The AUC and accuracy of the early PTB prediction model using random forest were 0.9988 and 0.9984, respectively. We found that PM10 concentration of each period before delivery was a major contributor to early PTB. Based on the SHAP values, we also found that higher PM10 concentration at 5 to 7 months before delivery increased the risk of early PTB. Furthermore, women with cardiovascular diseases were found to be more vulnerable to PM10 than those without cardiovascular diseases.

Effects of PM10 on PTB
Although the pathophysiology linking PM10 to PTB has not yet been clearly demonstrated, PM10-induced inflammation and oxidative stress are considered key pathways by which PM10 causes PTB [39][40][41][42][43][44]. In addition, because PM concentration differs by season and might therefore have different effects on PTB depending on the period of exposure, some studies have analyzed the effect of PM on PTB according to the trimester of pregnancy [18][19][20][21][22]. Considering these points, we analyzed the effect of PM10 on early PTB according to both the concentration of each period before delivery and the concentration of each specific calendar month, which could reflect the season. The current study found that maternal exposure to PM10 according to the period of pregnancy (PM10 concentration of each month before delivery) was more strongly associated with the risk of early PTB than the calendar-month concentration of PM10 (monthly PM10 concentration). In addition, higher PM10 concentration at 5 to 7 months before delivery (the first and early second trimester) was a major contributor to early PTB and was associated with an increased risk of PTB. This result was consistent with previous studies showing that maternal exposure to PM10 in the first and second trimesters could significantly increase the risk of PTB [18][19][20][21][22]. From the findings of the current study, we infer that maternal exposure to PM10 during the first and early second trimester of pregnancy might have more critical effects on PTB than exposure during other periods.

Effects of PM10 on PTB in women with cardiovascular diseases
The pathological mechanisms by which PM contributes to cardiovascular diseases can be broadly divided into direct translocation and an indirect pathway [45]. In the direct pathway, ultrafine particles translocate into the blood stream and act directly on the cardiovascular system [45]. The indirect pathway affects cardiovascular diseases through oxidative stress and activation of the inflammation pathway [45]. Several studies have reported that pro-inflammatory cytokines are increased in subjects exposed to PM [46][47][48]. Systemic inflammatory response can promote atherosclerosis, coagulability, and endothelial dysfunction, which ultimately affects the cardiovascular system [43]. In addition, PM can stimulate the autonomic nervous system and the hypothalamic-pituitary-adrenal (HPA) axis. This is also associated with systemic inflammatory responses and atherosclerosis [49][50][51][52][53][54]. Women with cardiovascular diseases have suboptimal cardiac adaptation during pregnancy compared to healthy women. They also have more underlying cardiovascular risk factors, which increases the likelihood of PTB [31][55][56][57][58][59]. In this study, we found that PM10 had a relatively stronger effect on early PTB in pregnant women with cardiovascular diseases than in those without cardiovascular diseases. We assume that PM10 worsens cardiovascular function in pregnant women with underlying cardiovascular diseases, which can further increase the risk of early PTB.

Strength and limitation
The strength of the current study is that we used large-scale population-based data and analyzed these data with machine learning, one of the optimal methods for analyzing large amounts of data. Moreover, we included a wide range of variables, including demographic/socioeconomic, obstetric and gynecologic, cardiovascular, and other medical information, as potential confounding factors. Furthermore, we analyzed the timing and co-morbidities that might amplify the effect of PM10 on early PTB. However, this study also has some limitations. First, we could not present the actual gestational age at delivery because we used original data from KNHIS claims, which provide only ICD-10 codes, not the actual gestational age at delivery. In addition, we could not subdivide the cause of early PTB. There are various mechanisms of early PTB, including spontaneous preterm labor, severe maternal morbidity such as preeclampsia, and severe fetal morbidity such as non-reassuring fetal heart rate. However, we could not analyze the mechanism of PTB due to the lack of information in the original data. Lastly, other air pollutants such as PM2.5, NO2, and O3 were not evaluated.

Conclusion
With this large population-based cohort study using machine learning, we found that maternal exposure to PM10 was a major contributor to early PTB. Moreover, we found that, in the context of PTB, pregnant women with cardiovascular diseases are a group at high risk from PM10 and the first and early second trimester is a high-risk period for PM10 exposure. The current study emphasizes the importance of PM10 as an overlooked risk factor for PTB. We believe that these findings can alert both obstetricians and pregnant women to the risk of PM10, and that efforts are needed to reduce maternal exposure to PM10, especially among pregnant women with cardiovascular diseases in their first and early second trimester.

Supporting information
S1 Table. ICD-10 Codes and procedure codes for preterm birth and cardiovascular diseases.